Refactor translation functionality to use M2M100 model from Hugging Face Transformers (Japanese Translation Support) #63
This implementation is sufficient for Japanese translation. Here's why:
sequenceDiagram
    participant Client
    participant router_py as router.py
    participant service_py as service.py
    participant translate_py as translate.py (ModelBasedTranslate)
    participant langdetect
    participant transformers as transformers (M2M100 Model)
    Client->>router_py: POST /rai/v1/moderations (Japanese Prompt)
    activate router_py
    router_py->>service_py: getModerationResult(payload)
    activate service_py
    alt Language is not English
        service_py->>translate_py: translator.translate(prompt)
        activate translate_py
        translate_py->>langdetect: detect(Japanese Prompt)
        activate langdetect
        langdetect-->>translate_py: returns 'ja'
        deactivate langdetect
        translate_py->>transformers: Set source lang to 'ja'
        translate_py->>transformers: Generate translation for 'en'
        activate transformers
        transformers-->>translate_py: returns English text
        deactivate transformers
        translate_py-->>service_py: "Translated English Text", "ja"
        deactivate translate_py
    end
    service_py-->>service_py: Perform moderation on English text
    service_py-->>router_py: Moderation Result
    deactivate service_py
    router_py-->>Client: JSON Response
    deactivate router_py

Model Support: The facebook/m2m100_418M model that has been integrated is a multilingual translation model that explicitly supports Japanese among the 100 languages it was trained on.
Language Detection: The langdetect library is used to automatically identify the language of the input text. When a user provides a prompt in Japanese, langdetect will identify its language code as ja.
Translation Process: The ModelBasedTranslate class uses this detected language code (ja) to set the source language for the tokenizer. It then instructs the model to translate the text into English (en), which is the language the moderation guardrails are designed to process.
Therefore, the pipeline is fully equipped to receive Japanese text, translate it to English, and then pass it to the moderation checks, fulfilling the requirements of the feature.
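For reviewers who want to see the flow in code, here is a minimal sketch of the detect-then-translate step described above. It is not the code in translate.py: the function names (`detect_language`, `m2m100_translate`, `translate_for_moderation`, `_load_m2m100`) are hypothetical, and the sketch assumes `langdetect`, `transformers`, and `torch` are installed. The third-party imports are deferred so the control flow can be exercised with stubs.

```python
from functools import lru_cache
from typing import Callable, Tuple

MODEL_NAME = "facebook/m2m100_418M"


def detect_language(text: str) -> str:
    """Return a language code such as 'ja' using langdetect."""
    from langdetect import detect  # third-party; imported lazily
    return detect(text)


@lru_cache(maxsize=1)
def _load_m2m100():
    """Load the tokenizer and model once and reuse them (hypothetical helper)."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
    tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)
    model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME)
    return tokenizer, model


def m2m100_translate(text: str, src_lang: str, tgt_lang: str = "en") -> str:
    """Translate text from src_lang to tgt_lang with M2M100."""
    tokenizer, model = _load_m2m100()
    tokenizer.src_lang = src_lang  # e.g. 'ja' as detected above
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


def translate_for_moderation(
    text: str,
    detect: Callable[[str], str] = detect_language,
    translate: Callable[[str, str], str] = m2m100_translate,
) -> Tuple[str, str]:
    """Return (english_text, detected_lang); English input passes through."""
    lang = detect(text)
    if lang == "en":
        return text, lang
    return translate(text, lang), lang
```

Calling `translate_for_moderation(prompt)` with the defaults downloads the model on first use. Passing stub callables for `detect` and `translate` lets the branching be unit-tested without the model.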
Related issue: #21